- How we have used R Markdown in our undergraduate courses
- Reproducibility with R Markdown
- And some things that we think are just cool in Rmd.
- BIG Thank you to Project TIER and the Alfred P. Sloan Foundation
CTREE, 1 June 2017
tidyverse, mosaic, stargazer
dyndoc (Stata 15)? (reticulated Python)
Michael O'Hara:
Senior thesis seminar with 9 students
Advantages:
Cost
Courtesy of Bray, 2016
The Good
The Bad
The Ugly?
Courtesy of Bray, 2016
The good
The bad
The ugly?
Raw Markdown
Knitted Markdown
# Header 1
## Header 2
### Header 3
This is normal sized text used in the body of our work.
For bullet points, we use dashes, e.g.
- Intro to RStudio
- More content
    - a sub-point
- Back to the original level
R Markdown can produce a variety of document types (other than the default HTML page):
pdf_document makes a PDF with LaTeX (.pdf)
word_document for Microsoft Word documents (.docx).
odt_document for OpenDocument Text documents (.odt).
rtf_document for Rich Text Format documents (.rtf)
And others.
R Markdown can also be repurposed to produce a presentation file (as with this presentation):
ioslides_presentation opens in your browser and is interactive (.html)
slidy_presentation is another browser-based presentation format (.html)
beamer_presentation makes a PDF with LaTeX (.pdf)
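One way to see the "one source, many outputs" idea is to render the same file to different formats from the R console. The following is a minimal sketch, assuming the rmarkdown package and pandoc are installed; the file contents and the variable names (`rmd_path`, `can_render`) are placeholders and not taken from the talk.

```r
# Build a tiny R Markdown source file in a temporary location.
# The title and body text are placeholders for illustration only.
fence <- strrep("`", 3)  # three backticks, built this way to avoid nested fences
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Demo\"",
  "output: html_document",
  "---",
  "",
  "Some body text.",
  "",
  paste0(fence, "{r cars}"),
  "summary(cars)",
  fence
), rmd_path)

# Render the same source to two different formats.
# Guarded: rendering needs the rmarkdown package and a pandoc installation.
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()
if (can_render) {
  html_file <- rmarkdown::render(rmd_path, output_format = "html_document",
                                 quiet = TRUE)
  docx_file <- rmarkdown::render(rmd_path, output_format = "word_document",
                                 quiet = TRUE)
}
```

In day-to-day use the format is usually set once in the `output:` field of the YAML header and the Knit button in RStudio does the rendering.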
Think about data analysis as falling into three loose categories:
All of this occurs in the code "chunk"
To open a code chunk hit CMD + OPTION + I on a Mac
Or type out three backticks ``` followed by {r}
And then three more backticks ``` on another line.
Within the {r} you can specify options, like {r, eval = FALSE} if you don't want the code to be evaluated
Or you can label the code chunk, e.g. {r cars} labels the chunk "cars" in your ToC
```{r cars, echo = TRUE}
summary(cars)
```
The option echo = TRUE means that the code gets included in the rendered HTML.
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
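To make the effect of chunk options concrete, the same small chunk can be knitted three times with different options. This is only a sketch, assuming the knitr package is installed; the helper `make_chunk()` and the chunk label `demo` are hypothetical names introduced for illustration.

```r
# Three backticks, built programmatically so this example does not nest fences.
fence <- strrep("`", 3)

# Hypothetical helper: wrap two lines of R code in a chunk with the given header.
make_chunk <- function(header) {
  c(paste0(fence, "{", header, "}"), "x <- 1 + 1", "x", fence)
}

if (requireNamespace("knitr", quietly = TRUE)) {
  # echo = TRUE: the code and its result both appear in the output.
  shown <- knitr::knit(text = make_chunk("r demo, echo = TRUE"), quiet = TRUE)
  # echo = FALSE: only the result appears.
  hidden <- knitr::knit(text = make_chunk("r demo, echo = FALSE"), quiet = TRUE)
  # eval = FALSE: the code is shown but never run, so there is no result.
  skipped <- knitr::knit(text = make_chunk("r demo, eval = FALSE"), quiet = TRUE)
  cat(shown, hidden, skipped, sep = "\n\n")
}
```

Several options can be combined in one header, separated by commas after the label.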
##   session subject  r1  r2  r3  r4  r5  r6  r7  r8  r9  treatment team
## 1       1       1   0   0   0   0  10  10   0   0   0 individual   NA
## 2       1       2   0   0  30  40  40   0   0   0  20 individual   NA
## 3       1       3  30  30   0   0   0  60  60  10   0 individual   NA
## 4       1       4  20   0 100   0   0  30  75 100 100 individual   NA
## 5       1       5 100 100 100 100 100 100 100 100 100 individual   NA
## 6       1       6 100 100 100 100 100 100 100   0   0 individual   NA
##           uniqid
## 1 1_individual_1
## 2 1_individual_2
## 3 1_individual_3
## 4 1_individual_4
## 5 1_individual_5
## 6 1_individual_6
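The data above are in wide form (one column per round), while the model output elsewhere in these slides uses a single `value` column. The slides do not show how the reshaping was done, so the following is only a sketch in base R on simulated data; the object names `wide`, `long`, and `round_cols` and all of the numbers are made up for illustration.

```r
# Simulated wide data: one row per subject, one column per round.
set.seed(1)
wide <- data.frame(
  subject   = 1:6,
  treatment = rep(c("individual", "message"), each = 3),
  r1 = sample(0:100, 6),
  r2 = sample(0:100, 6),
  r3 = sample(0:100, 6)
)

# Stack the round columns into one `value` column (long form).
round_cols <- c("r1", "r2", "r3")
long <- data.frame(
  subject   = rep(wide$subject, times = length(round_cols)),
  treatment = rep(wide$treatment, times = length(round_cols)),
  round     = rep(round_cols, each = nrow(wide)),
  value     = unlist(wide[round_cols], use.names = FALSE)
)

head(long)
```

With the tidyverse loaded, a function such as `tidyr::pivot_longer()` does the same job in one call.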
##
##  Wilcoxon rank sum test with continuity correction
##
## data:  value by treatment
## W = 52876, p-value = 3.838e-10
## alternative hypothesis: true location shift is not equal to 0
##
## Call:
## lm(formula = value ~ treatment, data = SutNarrow)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -61.370 -29.385  -0.542  38.630  60.615
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)          39.385      1.451  27.152  < 2e-16 ***
## treatmentmessage     21.985      1.994  11.028  < 2e-16 ***
## treatmentmixed       10.609      1.925   5.510 3.92e-08 ***
## treatmentpaycomm     10.886      2.144   5.077 4.09e-07 ***
## treatmentteamtreat   16.313      2.629   6.204 6.34e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 34.81 on 2713 degrees of freedom
## Multiple R-squared:  0.04473, Adjusted R-squared:  0.04333
## F-statistic: 31.76 on 4 and 2713 DF,  p-value: < 2.2e-16
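Output like the rank sum test and the regression summary above comes from ordinary R calls placed inside a code chunk. The sketch below uses simulated two-group data, not the `SutNarrow` data from the slides; the data frame `sim` and its numbers are assumptions made purely for illustration.

```r
# Simulated long-format data with two treatment groups (illustrative only).
set.seed(2017)
sim <- data.frame(
  treatment = rep(c("individual", "message"), each = 50)
)
sim$value <- 40 + 20 * (sim$treatment == "message") + rnorm(100, sd = 30)

# Wilcoxon rank sum test of value by treatment.
# exact = FALSE requests the normal approximation with continuity correction.
wt <- wilcox.test(value ~ treatment, data = sim, exact = FALSE)
print(wt)

# Linear model with treatment dummies; "individual" is the reference level.
fit <- lm(value ~ treatment, data = sim)
print(summary(fit))
```

Because the chunk is re-run every time the document is knitted, the reported statistics always match the code and data in the file.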
## Oneway (time) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = value ~ treatment, data = SutNarrow, effect = "time",
## model = "random", index = c("uniqid"))
##
## Balanced Panel: n = 302, T = 9, N = 2718
##
## Effects:
## var std.dev share
## idiosyncratic 1197.631 34.607 0.987
## time 16.062 4.008 0.013
## theta: 0.555
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -61.9325 -28.6639 -2.5889 33.7276 64.3014
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 39.3854 1.9657 20.0367 < 2.2e-16 ***
## treatmentmessage 21.9850 1.9818 11.0936 < 2.2e-16 ***
## treatmentmixed 10.6093 1.9140 5.5430 3.259e-08 ***
## treatmentpaycomm 10.8862 2.1315 5.1072 3.497e-07 ***
## treatmentteamtreat 16.3130 2.6138 6.2412 5.022e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 3403100
## Residual Sum of Squares: 3249200
## R-Squared: 0.045244
## Adj. R-Squared: 0.043836
## F-statistic: 32.1409 on 4 and 2713 DF, p-value: < 2.22e-16
| | Dependent variable: |
| | value |
| treatmentmessage | 21.985*** |
| | (1.994) |
| treatmentmixed | 10.609*** |
| | (1.925) |
| treatmentpaycomm | 10.886*** |
| | (2.144) |
| treatmentteamtreat | 16.313*** |
| | (2.629) |
| Constant | 39.385*** |
| | (1.451) |
| Observations | 2,718 |
| R2 | 0.045 |
How about Bayes' Rule?
\[\Pr(\mbox{Outcome} \mid \mbox{signal}) = \frac{\theta p}{\theta p + (1 - \theta)(1 - p)}\]
R Markdown uses \(\LaTeX\) for math, and RStudio displays it immediately.
That is, \(\LaTeX\) without the challenges of learning the packages, tables, etc. that make learning \(\LaTeX\) so hard.
In-line equations are bracketed by single dollar signs $.
Off-set equations are bracketed by double dollar signs $$.
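As a sketch of what this looks like in the raw .Rmd source, here is Bayes' rule written both ways. Reading \(\theta\) as a prior probability and \(p\) as the accuracy of the signal is an assumption made for this illustration; the slides do not define the symbols.

```latex
Inline: the prior is $\theta$ and the signal accuracy is $p$.

Off-set:
$$
\Pr(\text{Outcome} \mid \text{signal}) =
  \frac{\theta p}{\theta p + (1 - \theta)(1 - p)}
$$
```

For example, with \(\theta = 0.5\) and \(p = 0.8\) the expression evaluates to \(0.4 / (0.4 + 0.1) = 0.8\).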
R Markdown and RStudio together have excellent capabilities.
Michael:
Aaron:
Students have to adjust to get the basics right
Students like (or are used to) WYSIWYG editors such as MS Word and Google Docs, and Rmd is not WYSIWYG.
Installing packages
Server = REALLY GREAT!
New to R?
We took inspiration from many people along the way. We thank them and ask their forgiveness.